Declarative Visitors to Ease Fine-grained Source Code Mining with Full History on Billions of AST Nodes by Robert Dyer, Hridesh Rajan, and Tien N. Nguyen

نویسندگان

  • Robert Dyer
  • Hridesh Rajan
  • Tien N. Nguyen
چکیده

Software repositories contain a vast wealth of information about software development. Mining these repositories has proven useful for detecting patterns in software development, testing hypotheses for new software engineering approaches, etc. Specifically, mining source code has yielded significant insights into software development artifacts and processes. Unfortunately, mining source code at a large-scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse-grained, or sacrifice studying the history of the code due to both human and computational scalability issues. In this paper we address the substantial challenges of mining source code: a) at a very large scale; b) at a fine-grained level of detail; and c) with full history information. To address these challenges, we present domain-specific language features for source code mining. Our language features are inspired by object-oriented visitors and provide a default depthfirst traversal strategy along with two expressions for defining custom traversals. We provide an implementation of these features in the Boa infrastructure for software repository mining and describe a code generation strategy into Java code. To show the usability of our domain-specific language features, we reproduced over 40 source code mining tasks from two large-scale previous studies in just 2 person-weeks. The resulting code for these tasks show between 2.0x–4.8x reduction in code size. Finally we perform a small controlled experiment to gain insights into how easily mining tasks written using our language features can be understood, with no prior training. We show a substantial number of tasks (77%) were understood by study participants, in about 3 minutes per task.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Large-scale Empirical Study of Java Language Feature Usage

Programming languages evolve over time, adding additional language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has four editions and is currently drafting a fifth. While the addition of language features is driven by an assumed need by the community (often with direct requests for such features), there is little empirical e...

متن کامل

Preserving Separation of Concerns Through Compilation

Current aspect-oriented (AO) compilation techniques fail to preserve the separation of concerns for postcompilation phases. At the minimum, it makes efficient incremental compilation and unit testing of AO programs challenging. The contribution of this work is an improved approach for aspect-oriented compilation. Our approach rests on a new interface between the AO high-level language (HLL) com...

متن کامل

Operation-Based, Fine-Grained Version Control Model for Tree-Based Representation

Existing version control systems are often based on text line-oriented models for change representation, which do not facilitate software developers in understanding code evolution. Other advanced change representation models that encompass more program semantics and structures are still not quite practical due to their high computational complexity. This paper presents OperV, a novel operation...

متن کامل

Generalizing AOP for Aspect-Oriented Testing

In profiling program execution to assess the adequacy of a test set, a challenge is to select the code to be included in the coverage assessment. Current mechanisms for doing this are coarse-grained, assume traditional concepts of modularity, require tedious and error-prone manual selection, and leave the tester’s intent implicit in the input provided to the testing tools. The aspect-oriented c...

متن کامل

On Accelerating Source Code Analysis At Massive Scale

Encouraged by the success of data-driven software engineering (SE) techniques that have found numerous applications e.g. in defect prediction, specification inference, etc, the demand for mining and analyzing source code repositories at scale has significantly increased. However, analyzing source code at scale remains expensive to the extent that data-driven solutions to certain SE problems are...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013